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Abstract 

Communication networks have evolved from specialized, research and tactical transmission systems 
to large-scale and highly complex interconnections of intelligent devices, increasingly becoming more 
commercial, consumer-oriented, and heterogeneous. Propelled by emergent social networking services and 
high-definition streaming platforms, network traffic has grown explosively thanks to the advances in pro- 
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■ cessing speed and storage capacity of state-of-the-art communication technologies. As "netizens" demand 

■ a seamless networking experience that entails not only higher speeds, but also resilience and robustness 
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to failures and malicious cyber-attacks, ample opportunities for signal processing (SP) research arise. The 
vision is for ubiquitous smart network devices to enable data-driven statistical learning algorithms for 



Ov 

■ distributed, robust, and online network operation and management, adaptable to the dynamically-evolving 

network landscape with minimal need for human intervention. The present paper aims at delineating the 
analytical background and the relevance of SP tools to dynamic network monitoring, introducing the 
SP readership to the concept of dynamic network cartography - a framework to construct maps of the 
dynamic network state in an efficient and scalable manner tailored to large-scale heterogeneous networks. 

^ ■ I. Introduction 

O . 

Emergence of multimedia-enriched social networking services and Internet-friendly portable devices is 
multiplying network traffic volume day by day [53]. Wireless connectivity under the envisioned dynamic 
spectrum paradigm [29] relies on mobile networks of diverse nodes, which are nevertheless united by 
^vqj \ unparalleled cognition capabilities, adaptability, and decision-making attributes. Moreover, the advent of 
networks of intelligent devices such as those deployed to monitor the smart power grid, transportation 
networks, medical information networks, and cognitive radio (CR) networks, will transform the commu- 
nication infrastructure to an even more complex and heterogeneous one. Thus, ensuring compliance to 
service-level agreements and quality-of-service (QoS) guarantees necessitates breakthrough management 
and monitoring tools providing operators with a comprehensive view of the network landscape. Situational 
awareness provided by such tools will be the key enabler for effective information dissemination, routing 
and congestion control, network health management, risk analysis, and security assurance. 

But this great promise comes with great challenges. Acquiring network-wide performance and utilization 
metrics for large networks is no easy task. Suppose for instance that traffic volumes are of interest, not 
only for gauging instantaneous network health, but also for more complex network management tasks 
such as intrusion detection, capacity provisioning, and network planning [56]. While traffic volumes on 
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links (also called link counts) are readily acquired using off-the-shelf tools such as the simple network 
management protocol (SNMP), missing link-count measurements may still skew the network operator's 
perspective. SNMP packets may be dropped for instance, if some links become congested, rendering link- 
count information for those links more important, as well as less available [47], [49]. Classical approaches 
relying either on simple time-series interpolation or on regularized least-squares (LS) formulations for 
predicting the missing link counts [50], have not been able to fully capture the complexity of the Internet 
traffic. This is evidenced by the recent upsurge of efforts toward advanced network tomography [13], and 
spatio-temporal traffic estimation algorithms for network monitoring [26], [49], [56]. 

Similarly, path metrics such as end-to-end delays are of great interest to service providers because they 
directly affect the end-user experience. The challenge here is that the number of paths grows very fast as the 
number of nodes increases. Probing exhaustively all origin-destination pairs is impractical and wasteful 
of resources even for moderate-size networks [17], [48]. Accurate prediction of missing delays based 
on the inherent e.g., topology-induced correlation or smoothness traits among link and path quantities 
is therefore crucial for statistical analysis and monitoring tasks [32]. While the prevailing operational 
paradigm adopted in current networks entails nodes continuously communicating their link measurements 
to a central monitoring station, in-network distributed cooperation through local interactions is preferred 
for scalability and robustness considerations [38]. 

Conventional network monitoring tools entail a couple of additional limitations. First, they are typically 
resource heavy and tend to overload network operators with crude, unrefined data, without enough 
processing to separate the "data wheat from the chaff"; see e.g., [19] and references therein. It is thus 
of paramount importance to construct parsimonious descriptors of the network state, for the purpose of 
modeling, monitoring, and management of complex interconnected systems. Due to the diversity of modern 
networks, the network state can incorporate typical quantities such as traffic volumes and end-to-end delays, 
as well as latent social metrics such as hierarchy, reputation, and vulnerability. Second, malicious activities 
intended to undermine network functionality or compromise secrecy of data have grown in sophistication, 
thus rendering traditional signature -based intrusion detection schemes increasingly obsolete. Intrusion 
attempts and malicious attacks manifest themselves as abrupt changes in network states [5], and such 
anomalous patterns are oftentimes hidden within the raw high-dimensional network data [55]. For these 
reasons, unveiling network anomalies in a reliable and computationally-efficient manner is a challenging 
yet essential goal [33], [38], [55]. 

All in all, accurate network diagnosis and statistical analysis tools are instrumental for maintaining 
seamless end-user experience in dynamic environments, as well as for ensuring network security and 
stability. In this direction, this tutorial advocates the concept of dynamic network cartography as a tool 
for statistical modeling, monitoring, and management of complex networks. Focus will be placed on two 
complementary aspects of network cartography, namely, online construction of global network state maps 
using only a few measurements, and unveiling of network anomalies across network flows and time. 
The surveyed cartography algorithms leverage recent advances in machine learning and statistical signal 
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processing (SP) methods, including sparsity-cognizant learning, kriged Kalman filtering of dynamical 
processes over networks, nuclear norm minimization for low-rank matrix completion, semi-supervised 
dictionary learning, and in-network optimization via the alternating-directions method of multipliers. 
Through a unifying treatment that revolves around network cartography, this paper demonstrates how 
benefits from foundational SP methods can permeate to dynamic network monitoring, and collectively 
enable inference of global network health, thus leading to enhanced network robustness and QoS. 

II. Global performance prediction via dynamical network cartography 

This section deals with the problem of mapping the network state from incomplete sets of measurements, 
and touches upon two application domains. A dictionary learning algorithm is introduced first to efficiently 
impute missing link traffic volumes, using measurements from a wide class of (possibly non-stationary) 
traffic patterns [26]. Subsequently, the problem of tracking and predicting end-to-end network delay is 
considered, and the dynamic network kriging approach of [45] is described. 

A. Semi-supervised dictionary learning for traffic maps 

Consider an Internet protocol (IP) network comprising N nodes and L links, carrying the traffic of F 
origin-destination flows (network connections). Let x^t denote the traffic volume (in bytes or packets) 
passing through link I G {1, . . . , L} over a fixed interval of time (t, t + At). Link counts across the entire 
network are collected in the vector x t G IR L , e.g., using the ubiquitous SNMP protocol. Since measured 
link counts are both unreliable and incomplete due to hardware or software malfunctioning, jitter, and 
communication errors [47], [56], they are expressed as noisy versions of a subset of S < L links 



where St is an S x L selection matrix with 0- 1 entries whose rows correspond to rows of the identity matrix 
of size L, and e t is an S x 1 zero-mean noise term with constant variance accounting for measurement 
and synchronization errors. Given y t the aim is to form an estimate x t of the full vector of link counts 
x t , which in this case defines the network state. 

A simple approach implemented in measurement-processing software such as RRDtool [43], is to 
ignore the noise term and rely on one-dimensional interpolation for the time series {x^ t } per link I. 
The applicability and accuracy of this scheme is however limited, since it tacitly assumes that the entries 
of x t are uncorrected; missing entries x^ t are few and do not occur in bursts; and the time series {x t } 
is stationary. Nevertheless, none of these assumptions holds true in real networks [47]. 

The reliance on stationarity and availability of measurements from contiguous time intervals can be 
forgone if estimation of x 4 is performed for each t individually. In principle, x t can be obtained if the 
volumes of origin-destination (OD) traffic flows x t € M F are available, since they are related through 



y t = S t xt + €t, t = l,2, ... 



(1) 



x t = Rzf 



(2) 
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where the so-termed routing matrix R := [nj] G {0, l} LxF is such that rij = 1 if link I carries the flow 
/, and zero otherwise. However, measuring z t is even more difficult and in practice z t is itself estimated 
from {xt} through tomographic traffic inference [13], [32], where given R and noisy link counts, the 
goal is to estimate the OD flows as the solution of a linear inverse problem. Since the inverse problem 
is highly under-determined [F = 0{N 2 ) ^> L = O(N)], early approaches relied on prior knowledge in 
the form of statistical models for the OD flows (such as the Poisson, Gaussian, logit-choice, or gravity 
models), that ultimately serve as complexity-controlling (that is regularization) mechanisms [32, Ch. 9]. 
Among these, the state-of-the-art traffic matrix estimation algorithm uses an entropy-based regularizer, 
and has been shown to be fast, accurate, robust, and flexible [54]. Time-series analysis-based approaches 
(such as the Kalman filter in [50]) have also been proposed for scenarios where link-count measurements 
are available over contiguous time slots. 

Recently, a link-count prediction algorithm was put forth in [26], where missing entries of x t are 
estimated from historical measurements in Ts ■= {yt}f=\ by leveraging the structural regularity of R 
through a semi-supervised dictionary learning (DL) approach. Under the DL framework, data-driven 
dictionaries for sparse signal representation are adopted as a versatile means of capturing parsimonious 
signal structures; see e.g., [52] for a tutorial treatment. Propelled by the success of compressive sampling 
(CS) [23], sparse signal modeling has led to major advances in several machine learning, audio and 
image processing tasks [51], [52]. Motivated by these ideas, it is postulated in [26] that link counts 
can be represented as a linear combination x t = Bw t of a few (^C Q) columns of an over-complete 
dictionary (basis) matrix B := [bi, . . . ,\jq] € IR Lx( 2, where w t € M9 is a sparse vector of expansion 
coefficients. Many signals including speech and natural images admit sparse representations even under 
generic predefined dictionaries, such as those based on the Fourier and the wavelet bases, respectively [52]. 
Like audio and natural images, link counts can exhibit strong correlations as evidenced from the structure 
of R [cf. (2)]. For instance, the traffic volumes on links i and j are highly correlated if they both carry 
common flows. DL schemes are attractive due to their flexibility, since they utilize training data to learn 
an appropriate over-complete basis customized for the data at hand. However, the use of DL for modeling 
network data is well motivated but so far relatively unexplored. 

Prediction of link counts. Suppose for now that either a learnt, or, a suitable pre-specified dictionary 
B is available, and consider predicting the missing link counts. Data-driven learning of dictionaries from 
historical data will be addressed in the ensuing subsection. Given R and the link count measurements 
y t , contemporary tools developed in the area of CS and semi-supervised learning can be used to form 
x t , which includes estimates for the missing L — S link counts [8], [23], [51]. The spatial regularity of 
the link counts is captured through the auxiliary weighted graph Q with L vertices, one for each link in 
the network. The edge weights for all edges in Q are subsumed by the off-diagonal entries of the Gram 
matrix G = \g%J[ ■= RR' € M ixL , where (•)' denotes transposition. The off-diagonal entries gij count 
the number of OD flows that are common to both links i and j. Main diagonal entries of G count the 
number of OD flows that use the corresponding links. 
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Given a snapshot of incomplete link counts y t during the operational phase (where a suitable basis B 
is available), the sparse basis expansion coefficient vector w t is estimated as 

wj := argmin ||y t - S t Bw t ||2 + A^w^Hi + A 9 w£B'LBwt (3) 

where L := diag(Gli) — G denotes the Laplacian matrix of Q; X w ,X g > are tunable regularization 
parameters; and I/, is the L x 1 vector of all ones. The criterion in (3) consists of a LS error between 
the observed and postulated link counts, along with two regularizers. The ^i-norm 1 1 w< 1 1 1 encourages 
sparsity in the coefficient vector w t [23], [51]. With x t := [xij, ■ ■ ■ , XL,t]' given by x 4 = Bw t , the 
Laplacian regularization can be explicitly written as w£B'LBw t = (1/2) J2i=i X^=i 9i ,j ( x i,t ~ x j,t) 2 ■ It 
is thus apparent that w£B'LBw t encourages the link counts to be close if their corresponding vertices 
are connected in Q. Each summand is weighted according to the number of OD flows common to links i 
and j. Typically adopted for semi-supervised learning, such a regularization term encourages Bwj to lie 
on a smooth manifold approximated by Q, which constrains how the measured link counts relate to x t 
[8], [44]. It is also common to use normalized variants of the Laplacian instead of L [32, p. 46]. 

The cost in (3) is convex but non-smooth, and customized solvers developed for ^i-norm regularized 
optimization can be employed here as well, e.g., [27]. Once w t is available, an estimate of the full vector 
of link counts is readily obtained as x t := Bwj. It is apparent that the quality of the imputation depends 
on the chosen B, and DL from historical network data in Ts is described next. 

Data-driven dictionary learning. In its canonical form, DL seeks a (typically fat) dictionary B so that 
training data 71 := {^t}J=\ are well approximated as x t k, Bw t , t = 1, . . . , T, for some sparse vectors 
wt of expansion coefficients [52]. Standard DL algorithms cannot, however, be directly applied to learn B 
since they rely on the entire vector x t . To learn the dictionary in the training phase using incomplete link 
counts Ts instead of 71, the idea is to capitalize on the structure in x t , of which Q is an abstraction [26]. 
To this end, one can adopt a similar cost function as in the operational phase [cf. (3)], yielding the 
data-driven basis and the corresponding sparse representation 

T 

{W,B}:= argmin ^ [||y t - S t Bw t ||l+A t< ,||w t ||i+A fl w / tB / I'Bw t ] (4) 

W,B:{||b,||2<l}? =1 t=l 

where W := [wi, . . . , wj] € M9 xT . The constraints {||b 9 ||2 < l}^=i remove the scaling ambiguity in the 
products Bw t , and prevent the entries in B from growing unbounded. Again, the combined regularization 
terms in (4) promote both sparsity in w t through the £i-norm, and smoothness across the entries of Bw t 
via the Laplacian L. The regularization parameters A^ and X g are typically cross-validated [27], [51]. 
Although (4) is non-convex, a block coordinate-descent (BCD) solver still guarantees convergence to a 
stationary point [9]. The BCD updates involve solving for B and W in an alternating fashion, both doable 
efficiently via convex programming [26]. Alternatively, the online DL algorithm in [36] offers enhanced 
scalability by sequentially processing the data in Ts- The training and operational (prediction) phases are 
summarized in Fig. 1, where C((B,w) denotes the i-th summand from the cost in (4). 

The explicit need for Laplacian regularization is apparent from (4). Indeed, if measurements from a 
certain link are not present in Ts, the corresponding row of B may still be estimated with reasonable 
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t=i 



yt, * > r 



minC t (B,w t ) 



B[fc + 1] 



Xf = Bw t 



Fig. 1. Training and operational phases of the semi-supervised DL approach for link-traffic cartography in [26]. 

accuracy because of the third term in Ct(B,w). On top of that, it is because of Laplacian regularization 
that the prediction performance degrades gracefully as the number of missing entries in yt increases; see 
also Fig. 2. It is worth stressing that the time series {y^} need not be stationary or even contiguous in time. 
The link-traffic cartography approach described so far can also be adapted to accommodate time-varying 
network topologies or routing matrices, using a time-dependent Laplacian L|. A word of caution is due 
however, since drastic changes in either Lt or in the statistical properties of the underlying OD flows zt, 
will necessitate re-training B to attain satisfactory performance. Finally, note that DL techniques incur 
a complexity at least cubic in the size of the network, and are better suited for monitoring of backbone 
wide-area networks which are typically not very large. 

Next, a numerical test on link count data from the Internet2 measurement archive [1] is outlined. The 
data consists of link counts, sampled at 5 minute intervals, collected over several weeks. For the purposes of 
comparison, the training phase consisted of 2000 time slots, with a random subset of 50 links measured (out 
of L = 54 per time slot. The performance of the learned dictionary is then assessed over the next To = 2000 
time slots. Each test vector y t is constructed by randomly selecting S entries of the full link count vector 
Xj. The tuning parameters are chosen via cross-validation (A s = 0.1 and \ g = 10 -5 ). Fig. 2 shows the 
normalized reconstruction error (NRE), evaluated as (LTq)^ 1 Y2t=i IWt ~ x t|| 2 f° r different values of Q 
and S. For comparison, the prediction performance with a fixed diffusion wavelet matrix [18] (instead of 
the data-trained dictionary), as well as that of the entropy-penalized LS method [54] is also shown. The 
latter approach solves a LS problem augmented with a specific entropy-based regularizer, that encourages 
the traffic volumes at the source/destination pairs to be stochastically independent. The DL-based method 
markedly outperforms the competing approaches, especially for low values of S. Furthermore, note how 
performance degrades gracefully as S decreases. Remarkably, the predictions are close to the actual traffic 
even when using only 30 link counts during the prediction phase. 
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Fig. 2. Link-traffic cartography of Internet2 data [1]. Comparison of NRE for different values of S [26]. 

B. Delay cartography via dynamic network kriging 

Instead of link counts, consider now the problem of monitoring delays d p j on a set of multihop paths 
p G V, that connect P := \V\ source-destination pairs in an IP network. Path delays are important metrics 
required by network operators for assessment, planning, and fault diagnosis [17], [32], [45]. However, 
monitoring path metrics is challenging primarily because P generally grows as the square of the number 
of nodes in the network. Therefore, at any time t delays can only be measured on a subset of paths 
St C V, collected in the vector df. Based on the partial current and past measurements H t '■= {d£}* =1 , 
delay cartography amounts to predicting the remaining path delays df := {G?p,t} p( =:p\s- 

A promising approach in this context has been the application of kriging, a tool for spatial prediction 
popular in geostatistics and environmental sciences [21]. A network kriging scheme was developed in [17], 
which advocates prediction of network-wide path delays using measurements on a fixed subset of paths. 
The class of linear predictors introduced therein leverages network topology information to model the 
covariance among path delays. Building on these ideas, a dynamic network kriging approach capable of 
real-time spatio-temporal delay predictions was put forth in [45]. Specifically, a kriged Kalman filter is 
employed to explicitly capture temporal variations due to queuing delays, while retaining the topology- 
based spatial kriging predictor. The per-path delay d Pj t comprises several independent components due to 
contributions from each intermediate link and router, and is modeled in [45] as 



The queuing delay Xp,t (collected in Xt £ depends on the traffic, and exhibits spatio-temporal 
correlation, periodic behavior as well as occasional bursts, prompting the following random walk model 



where the driving noise rj t has zero mean and covariance matrix C v . The second term in (5), collected 
in the vector v t , combines the processing, transmission, and propagation delays, and is temporally white 
but spatially correlated, owing to the overlap between paths. Similar to [17], the correlation between two 



d p ,t — Xp,t + u p,t + e p,t- 



(5) 



xt = xt-i + m 



(6) 
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paths is modeled as being proportional to the number of links they share, so that the covariance matrix 
Cj, = aUU', where, u Pt i = 1 if path p contains link I, and u p j = otherwise. Finally, the noise term 
e P: t is zero mean i.i.d. with known variance a 2 . Defining the S x P path selection matrix as in Sec. II-A, 
the measurement equation can be written as (introduce vf := SfUt and likewise ef) 

d s t = S tX t + v? + e s t . (7) 

In the absence of S t , the spatio-temporal model in (6)-(7) is widely employed in geostatistics, where 
Xt is generally referred to as trend, and v t captures the random fluctuations around Xt\ see e -g- [40]. 
Similar models have been employed in [30] to describe the dynamics of wireless propagation channels, 
and in [20] for spatio-temporal random field estimation. For a static selection matrix, i.e., St := S for 
all t, the network kriging approach [17] entails the following two-step procedure: (si) treat as noise, 
and estimate Xt using the generalized LS criterion; and (s2) use the aforesaid estimate to find the linear 
minimum mean-square error (LMMSE) estimator (denoted by E*) for v\, namely 



E* \v s t \xt\ = SC„S' (SC u S' + a% 



StXt] ■ 



(8) 



Recently, a CS-based approach has also been reported for predicting network-wide performance met- 
rics [18]. For instance, diffusion wavelets were utilized in [18] to obtain a compressible representation of 
the delays, and account for spatial and temporal correlations. Although this allows for enhanced prediction 
accuracy relative to [17], it requires batch processing of measurements which does not scale well to large 
networks for real-time operation. Pictorially, the performance of different algorithms can be assessed 
through the delay maps shown in Fig. 3. 

The spatio-temporal model set forth earlier can provide a better estimate of Xt by efficiently processing 
both present and past measurements jointly. Towards this end, a Kalman filter is employed in [45], which 
at time t yields the following update equations 

Xt ■= E* [Xt\U t ] = xt-i + K t (dJ - S t xt-i) 

M t := E [(xt ~ Xt)(Xt ~ Xt)'] = (Ip " K t S t )(M t _i + C„) 

where K t := (M t _i + C u )S' t [S t (C I/ + C v + Mi_i)S£ + a 2 ^]" 1 is the so-termed Kalman gain. The 
final predictor, referred also as the kriged Kalman filter (KKF), is given by 



df := S t xt + StC v S' t (S t C u S' t + a 2 l s ) 1 [d? - S tX t] 



and the prediction error covariance matrix is 



Mf := E 



df-df) (df-df)' 



ct 2 Is + S t 



(Mt_i + C v + C^) 1 + 



s's t 



The KKF framework for dynamic network delay cartography has several attractive features. First, the 
KKF yields the LMMSE estimate even for non-Gaussian distributed noise. The Kalman filter step also 
allows for a r-step prediction given by dt+ T = Xt, which can be useful for preemptive routing and 
congestion control algorithms, as well as for extrapolating missing measurements. Second, the KKF 
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Fig. 3. True and predicted delay map for 62 paths in the Intemet-2 dataset [1] over an interval of 100 minutes. (Top-left) True 
delays; (Top-right) network kriging [17]; (Bottom-left) difussion wavelets [18]; and (Bottom-right) KKF [45]. Delays of several 
paths change slightly around t = 80, but this change is only discernible from the delay predictions offered by KKF. Delay maps 
summarize the network state, and are useful tools aiding operational decision in network monitoring and control stations [45]. 

framework provides a metric, namely the error covariance matrix Mf, for choosing the paths to be 
measured at each t, which define the selection matrix St- In the present setting, it turns out that the 
D-optimal design metric logdetMf is monotonic and supermodular with respect to the set S [45]. 
Thus, a simple greedy algorithm with complexity 0(PS 3 ) can be employed to find the set of paths 
that are at least 63% optimal [42]; see Fig. 4. Consequently, the technique can be readily applied to large- 
scale networks since the complexity increases only linearly with P. The framework also admits related 
problem formulations such as selecting the best set of monitors (nodes) capable of measuring delay on all 
its outgoing paths. This represents a significant departure from state-of-the-art delay prediction/tracking 
methods [17], [18], where path selection is heuristic. Note that training is required to estimate the model 
parameters C v and a. To this end, empirical estimation techniques similar to those in [41] can be adapted 
to the present case. 



III. Dynamic Anomalography 

This section switches gears to anomalography, the problem of unveiling and mapping-out network traffic 
anomalies across flows and time given link-level traffic measurements. This is a crucial monitoring task 
towards engineering network traffic, since anomalies can result in congestion and limit QoS provisioning. 
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Fig. 4. Delay cartography using the NZ-AMP dataset [2], which includes path delays collected over a month for an IP network 
where P — 186 and N = 30 [45]. Normalized mean-square prediction error (NMSPE) as a function of S. (Left) Random path 
selection; and (Right) "Optimal" path selection, that is, using heuristic or approximate algorithms specified for each algorithm. 
Observe further that the performance of the KKF improves as the length of the training interval tr increases. 

A. Traffic modeling 

Consider a backbone IP network where Af and C denote the sets of nodes (routers) and physical links 
of cardinality \M\ = N and \C\ = L, respectively. The operational goal of the network is to transport a set 
of OD traffic flows F (with \F\ = F) associated with specific OD (ingress-egress router) pairs. Single- 
path routing is adopted here, meaning a given flow's traffic is carried through multiple links connecting 
the corresponding source-destination pair along a single path. Accordingly, over a discrete time horizon 
t € [1,T] the measured link counts X := [x^ t ] £ M Lxr and (unobservable) OD flow traffic matrix 
Z := [zft] £ ffi FxT , are thus related through X = RZ [cf. (2)]. Unless otherwise stated, the routing 
matrix R is assumed given, since it can be otherwise estimated using traceroute or topology inference 
algorithms [24]. It is also fat, as for backbone networks the number of OD flows is much larger than 
the number of physical links (F L). A cardinal property of the traffic matrix is noteworthy. Common 
temporal patterns across OD traffic flows in addition to their almost periodic behavior, render most rows 
(respectively columns) of the traffic matrix linearly dependent, and thus Z typically has low rank. This 
intuitive property has been extensively validated with real network data; see Fig. 5 and e.g., [33]. 

It is not uncommon for some of the OD flow rates to experience unexpected abrupt changes. These so- 
termed traffic volume anomalies are typically due to (unintentional) network equipment misconfiguration 
or outright failure, unforeseen behaviors following routing policy modifications, or, cyberattacks (e.g., 
DoS attacks) which aim at compromising the services offered by the network [33], [55]. Let aj t denote 
the unknown amount of anomalous traffic in flow / at time t. Explicitly accounting for the presence of 
anomalous flows, the measured traffic carried by link I is then given by yi t = Xl/eJ 7 r l,f( z f,t + a f,t) + 
en, t = 1,...,T, where the noise variables ejf capture measurement errors and unmodeled dynamics. 
Traffic volume anomalies are (unsigned) sudden changes in OD flow's traffic, and as such their effect can 
span multiple links in the network. A key difficulty in unveiling anomalies from link-level measurements 
only is that oftentimes, clearly discernible anomalous spikes in the flow traffic can be masked through 
"destructive interference" of the superimposed OD flows [33]. An additional challenge stems from missing 
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Fig. 5. Volumes of 6 representative (out of 121 total) OD flows, taken from the operation of Internet-2 during a seven-day 
period [1]. Temporal periodicities and correlations across flows are apparent. As expected, in this case Z can be well approximated 
by a low-rank matrix, since its normalized singular values decay rapidly to zero. 

link-level measurements y^ t , an unavoidable operational reality affecting most traffic engineering tasks that 
rely on (indirect) measurement of traffic matrices [47], [56]. To model missing link measurements, collect 
the tuples (l,t) associated with the available observations yi )t in the set Q C [1,2, ...,L] x [1,2, ...,T]. 
Introducing the matrices Y := [y/,i],E := [ei >t ] € R LxT , and A := [afj] G R FxT , the (possibly 
incomplete) set of link-traffic measurements can be expressed in compact matrix form as 

VnOO =Pn(X + RA + E) (9) 

where the sampling operator Vq(.) sets the entries of its matrix argument not in $7 to zero, and keeps the 
rest unchanged. Since the objective here is not to estimate the OD flow traffic matrix Z, (9) is expressed 
in terms of the nominal (anomaly-free) link-level traffic rates X, which inherits the low-rank property of 
Z. Anomalies in A are expected to occur sporadically over time, and last for a short time relative to the 
(possibly long) measurement interval [1,T]. In addition, only a small fraction of the flows is supposed to 
be anomalous at a any given time instant. This renders the anomaly traffic matrix A sparse across both 
rows (flows) and columns (time). 

B. Unveiling anomalies via sparsity and low rank 

Given link-level traffic measurements Vq(Y) adhering to (9), dynamic anomalography is a critical 
network monitoring task that aims at accurately estimating the anomaly matrix A. As argued next, 
capitalizing on the sparsity of A and the low-rank property of X will be instrumental in achieving 
this ambitious goal. From a network cartography vantage point, the resultant estimated map A offers a 
depiction of the network's "health state" along both the flow and time dimensions. If \cif t t\ > 0, the f-th 
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flow at time t is deemed anomalous, otherwise it is healthy. This joint estimation-detection task not only 
allows one to identify the time of the anomaly in addition to the affected flows, but also to estimate its 
magnitude which hints to the importance of the anomaly event. By examining R the network operator 
can immediately determine the links carrying the anomalous flows. Subsequently, planned contingency 
measures involving traffic-engineering algorithms can be implemented to address network congestion. 

The low-rank property of the traffic matrix Z (and X) is at the heart of the seminal network anomaly 
detection approach in [33]. In the absence of missing data, the method therein adopts principal component 
analysis (PC A) to decompose the link traffic Y = [yi, . . . , yr] into nominal and anomalous components 
(also known as modeled and residual traffic). For instance, if most of the variance in Y is captured by 
r <C min(L, T) dominant principal components, then by construction the nominal subspace S n is spanned 
by the r dominant right singular vectors of Y' (cf. the low rank assumption). Naturally, the anomalous 
subspace S a corresponds to the orthogonal complement, i.e., S a := S^ - In the operational phase, an 
anomaly is declared at time t when ||Ps o yt||2 exceeds a given threshold, where Ps a is an orthogonal 
projection matrix onto S a . Subsequently, a single anomalous flow is identified after running a greedy 
algorithm, and an estimate of the amount of anomalous traffic is obtained as a byproduct. Likewise, 
the spatial approach within the network anomography framework [55] forms the matrix Ps a Y of link 
anomalies, thus exploiting the correlation between traffic across different links. Temporal approaches obtain 
link anomalies as YT instead, where T is a linear operator which judiciously filters the traffic time series 
per link (implementing an "anomaly-pass" filter). Several choices for T are proposed to this end, based 
on different forms of temporal analysis including autoregressive integrated moving average (ARIMA), 
wavelets, and fast Fourier transform (FFT). Different from [33], the inference algorithm in [55] capitalizes 
on the sparsity of A to estimate the anomaly map by e.g., solving in the spatial case 

A := argmin ||A||i, s. t. Ps a Y = RA. 

A 

Network anomography algorithms can be extended to accommodate routing changes across time; see [55] 
for further details and comprehensive performance tests. 

Recently, a natural estimator leveraging the low rank property of X and the sparsity of A was put 
forth in [38], which can be found at the crossroads of CS [23] and timely low -rank plus sparse matrix 
decompositions [10], [14]. The idea is to fit the incomplete data Vq(Y) to the model X + RA [cf. 
(9)] in the LS error sense, as well as minimize the rank of X, and the number of nonzero entries of 
A measured by its iV(P seu do) norm. Unfortunately, albeit natural both rank and £o- norm criteria are in 
general NP-hard to optimize. Typically, the nuclear norm ||X||* := Ylk a kO^) ( cr fe(^) denotes the A;-th 
singular value of X) and the ^i-norm ||A||i are adopted as surrogates [11], [25], since they are the closest 
convex approximants to rank(X) and ||A||o, respectively. Accordingly, one solves 

min ||^(Y-X-RA)||^ + A*||X||* + Ai||A||i (10) 

{X,A} 

where A*, Ai > are rank- and sparsity-controlling parameters. While a non-smooth optimization problem, 
being convex (10) is appealing. An efficient accelerated proximal gradient algorithm with quantifiable 
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iteration complexity was developed to unveil network anomalies [39]. Interestingly, (10) also offers a 
cleansed estimate of the link-level traffic X, that could be subsequently utilized for network tomography 
tasks. In addition, (10) jointly exploits the spatio-temporal correlations in the link traffic as well as the 
sparsity of the anomalies, through an optimal single-shot estimation-detection procedure that has been 
shown to outperform the algorithms in [33] and [55] (that decouple the estimation and detection steps). 




False-ralarm probability 



Fig. 6. Unveiling anomalies from Internet-2 data [1]. (Left) ROC curve comparison between (10) and the PCA methods in [33], 
[55], for different values of r := dim(iS n ). Leveraging sparsity and low rank jointly leads to improved performance. (Right) In 
red, the estimated anomaly map A obtained via (10) superimposed to the "true" anomalies shown in blue [37]. 

Before moving on to distributed implementations, it is instructive to elaborate on the generality of (10). 
When there is no missing data and X = OlxT, one is left with an under-determined sparse signal recovery 
problem typically encountered with CS; see e.g., [23]. The decomposition Y = X + A corresponds to 
principal component pursuit (PCP), also referred to as robust PCA [10], [14]. For the idealized noise-free 
setting (E = Olxt), sufficient conditions for exact recovery of the unknowns are available for both of the 
aforementioned special cases [10], [11], [14]. However, the superposition of a low-rank plus a compressed 
sparse matrix in (9) further challenges identifiability of {X,A}; see [39] for early results. Going back 
to the CS paradigm, even when X is nonzero one could envision a variant where the measurements are 
corrupted with correlated (low-rank) noise [15]. Last but not least, when A = OfxT and Y is noisy, 
the recovery of X subject to a rank constraint is nothing but PCA - arguably, the workhorse of high- 
dimensional data analytics. This same formulation is adopted for low-rank matrix completion, to impute 
the missing entries of a low-rank matrix observed in noise, i.e., Vq(Y) = Vq(X. + E) [12]. 

C. In-network distributed processing 

Implementing (10) presumes that network nodes continuously communicate their link traffic measure- 
ments to a central monitoring station, which uses their aggregation in 'Pq(Y) to unveil anomalies. While 
for the most part this is the prevailing operational paradigm adopted in current networks, it is fair to say 
there are limitations associated with this architecture. For instance, fusing all this information may entail 
excessive protocol overheads. Moreover, minimizing the exchanges of raw measurements may be desirable 
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to reduce unavoidable communication errors that translate to missing data. Solving (10) centrally raises 
robustness concerns as well, since the central monitoring station represents an isolated point of failure. 

These reasons motivate well devising fully-distributed iterative algorithms for dynamic anomalography, 
embedding the network anomaly detection functionality to the routers. In a nutshell, per iteration nodes 
n G M carry out simple computational tasks locally, relying on their own link count measurements (a 
submatrix Y n within Y = [Y^, . . . ,Y' N ]' corresponding to router n's links). Subsequently, local estimates 
are refined after exchanging messages only with directly connected neighbors, which facilitates percolation 
of local information to the whole network. The end goal is for network nodes to consent on a global map of 
network anomalies A, and attain (or at least come close to) the estimation performance of the centralized 
counterpart (10) which has all data Tn(Y) available. 

Problem (10) is not amenable for distributed implementation due to the non-separable nuclear norm 
present in the cost function. If an upper bound rank(X) < p is a priori available [recall X is the estimated 
link-level traffic obtained via (10)], (10)'s search space is effectively reduced and one can factorize the 
decision variable as X = PQ', where P and Q are L x p and T x p matrices, respectively. Again, it 
is possible to interpret the columns of X (viewed as points in R L ) as belonging to a low -rank nominal 
subspace S n , spanned by the columns of P. The rows of Q are thus the projections of the columns of X 
onto S n . Next, consider the following alternative characterization of the nuclear norm (see e.g. [46]) 

||X||„,:= min \ (\\P\\ 2 F + ||Q||f,) , s. t. X = PQ' (11) 
{P.Q} ^ 

where the optimization is over all possible bilinear factorizations of X, so that the number of columns p 
of P and Q is also a variable. Leveraging (11), the following reformulation of (10) provides an important 
first step towards obtaining a distributed anomalography algorithm 

N r a a i 

{pTa} ^ [l^ Y ™ - P " Q ' - R " A )I^ + 2^ W P »^ + IIQI1 ^) + aH |A||i J (12) 

which is non-convex due to the bilinear terms P„Q', and where R := [R^, . . . ,H' N ]' is partitioned into 
local routing tables available per router n. Adopting the separable Frobenius-norm regularization in (12) 
comes with no loss of optimality relative to (10), provided rank(X) < p. By finding the global minimum 
of (12) [which could have considerably less variables than (10)], one can recover the optimal solution of 
(10). But since (12) is non-convex, it may have stationary points which need not be globally optimum. As 
asserted in [38, Prop. 1] however, if a stationary point {P, Q, A} of (12) satisfies ((^(Y — PQ' — A)|| < 
A*, then {X := PQ', A := A} is the globally optimal solution of (10). Note that for sufficiently small p 
the residual ||Pn(Y — PQ' — A)|| becomes large, and the qualification inequality is violated [unless A* 
is large enough, in which case a sufficiently low -rank solution to (10) is expected]. The condition on the 
residual implicitly enforces rank(X) < p, which is necessary for the equivalence between (10) and (12). 

To decompose the cost in (12), in which summands inside the square brackets are coupled through the 
global variables {Q, A}, introduce auxiliary copies {Q„, A n }^ =1 representing local estimates of {Q, A}, 
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one per node n. These local copies along with consensus constraints yield the distributed estimator 

N 

A i 

(13) 



||Pn„(Y n - P n Q^ - R„A n )||| + A (7V||P n ||| + ||Q n ||f,) + ^||A n || 1 



min > 

{ P„,Q„,A„}^ 

s. t. Q„ = Q m , A n = A m m linked with n e Af 



which is equivalent to (12) provided the network topology graph is connected. Even though consensus is 
a fortiori imposed within neighborhoods, it extends to the whole (connected) network and local estimates 
agree on the global solution of (12). Exploiting the separable structure of (13), a general framework for 
in-network sparsity-regularized rank minimization was put forth in [38]. Specifically, distributed iterations 
were obtained after adopting the alternating-direction method of multipliers (ADMM), an iterative La- 
grangian method well-suited for parallel processing [9]. In a nutshell, local tasks per iteration k = 1,2,... 
entail solving small unconstrained quadratic programs to refine the normal subspace P n [fe], in addition to 
soft-thresholding operations to update the anomaly maps A n [k] per router. Each iteration, routers exchange 
their estimates {Q n [fc], A n [fc]} only with directly connected neighbors. This way the communication 
overhead remains affordable, and independent of the network size N. 

When employed to solve non-convex problems such as (13), so far ADMM offers no convergence guar- 
antees. However, there is ample experimental evidence in the literature that supports empirical convergence 
of ADMM, especially when the non-convex problem at hand exhibits "favorable" structure. For instance, 
(13) is a linearly constrained bi-convex problem with potentially good convergence properties - extensive 
numerical tests in [38] demonstrate that this is indeed the case. While establishing convergence remains 
an open problem, one can still prove that upon convergence the distributed iterations attain consensus and 
global optimality, offering the desirable centralized performance guarantees [38]. 

D. Real-time anomaly trackers 

Monitoring of large-scale IP networks necessitates massive recollection of data which far outweigh the 
ability of modern computers to store and analyze them in real time. In addition, nonstationarities due 
to routing changes and missing data further challenge identification of anomalies. In dynamic networks 
routing tables are constantly readjusted to effect traffic load balancing and avoid congestion caused by 
e.g., traffic anomalies. To account for slowly time-varing routing tables, let H t G ~R LxF denote the 
routing matrix at time t. In this dynamic setting, the partially observed link counts at time t adhere to 
'Po.tiyt) = ~Pn t (^t + R*a< + e t ), t = 1,2,..., where the link-level traffic x t := R t z t . In general, routing 
changes may alter a link load considerably by e.g., routing traffic completely away from a specific link. 
Therefore, even though the OD flow vectors {z t } live in a low-dimensional subspace, the same may not 
be true for the {x t } when the routing updates are major and frequent. In backbone networks however, 
routing changes are sporadic relative to the time-scale of data acquisition used for network monitoring 
tasks. For example, data collected from the operation of Internet-2 network reveals that only a few rows 
of Ri change per week [1]. It is thus safe to assume that {x t } still lies in a low-dimensional subspace, 
and exploit the spatio-temporal correlations of the observations to identify the anomalies in real-time. 
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On top of the previous arguments, in practice link measurements are acquired sequentially in time, 
which motivates updating previously obtained estimates rather than re-computing new ones from scratch 
each time a new datum becomes available. The goal is then to recursively estimate {x^,a f } at time t 
from historical observations {Va T (yr)}t=i> naturally placing more importance on recent measurements. 
To this end, one possible adaptive counterpart to (12) is the exponentially-weighted LS estimator found 
by minimizing the empirical cost [37] 



t 

min V/3*~ T 
{P,Q,A} ^ 



T=l 



11+ 



^* ||p||2 , ^* ii ||2 , x || || 

,i ^W P \\f + yNrlb + Ai||a T ||i 



(14) 



2EU/3* 

in which < /3 < 1 is the so-termed forgetting factor. When j3 < 1 data in the distant past are exponentially 
downweighted, which facilitates tracking network anomalies in nonstationary environments. For static 
routing (R t = R) and infinite memory (/3 = 1), the formulation (14) coincides with the batch estimator 
(12). A provably convergent online algorithm for dynamic anomalography is developed in [37], based 
on alternating minimization of (14); see Fig. 7. Each time a new datum is acquired, anomaly estimates 
are formed via the Lasso [51], and the low -rank nominal traffic subspace is refined using recursive LS. 
For situations were reducing computational complexity is critical, an online stochastic gradient algorithm 
based on Nesterov's acceleration technique is developed as well [37]. 
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Fig. 7. Unveiling anomalies in real time from Intemet-2 data [1]. (Left) Measured link traffic and cleansed estimates for three 
representative links; and (Right) three rows of the estimated anomaly map A corresponding to three anomalous flows [37]. 

Algorithms in [37] are closely related to timely robust subspace trackers, which aim at estimating a low- 
rank subspace P from grossly corrupted and possibly incomplete data, namely Vu t (yt) = Vn t (Pq< +&t + 
e t ), t = 1, 2, .... In the absence of sparse "outliers" {a t }uLi> an online algorithm based on incremental 
gradient descent on the Grassmannian manifold of subspaces was put forth in [4]. The second-order RLS- 
type algorithm in [16] extends the seminal projection approximation subspace tracking (PAST) algorithm 
to handle missing data. When outliers are present, robust counterparts can be found in [15], [28]. Relative 
to all aforementioned works, the estimation problem (14) is more challenging due to the presence of the 
(compression) routing matrix H t ; see [39] for fundamental identifiability issues related to the model (9). 
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IV. Broadening the network atlas 

Additional cartography instances are outlined in this section, including anomalography from flow 
measurements and network distance prediction. To exemplify the development of sensing infrastructure for 
situational awareness at the physical layer of wireless CR networks, the notion of RF cartography is intro- 
duced as well. All these problems can be tackled through SP methods subsumed by (10), namely PCP [14], 
low -rank matrix completion [12], the Lasso [51], and non-parametric versions of basis pursuit [7]. 

A. Unveiling anomalies from flow data 

Since some networks nowadays collect OD flow (not link-level) measurements Zf jt +a,f :t for at least part 
of their network (using e.g., the Netflow protocol), anomalies can be detected using temporal decomposition 
and standard change-detection approaches per flow. Leveraging the low-rank property of the traffic matrix 
and the sparsity of anomalies, anomalography from OD flow measurements was formulated as the PCP 
matrix decomposition problem and solved centrally in [3]; see also [38] for a distributed implementation 
of the PCP estimator aimed at scalable monitoring of networks. 

B. Network distance prediction 

End-to-end network distance information is critical towards enhancing QoS in Internet applications 
such as content distribution and peer-to-peer file sharing systems. Clients naturally prefer to establish 
connections with "closer" network resources or servers that are likely to respond faster. There are different 
metrics to quantify the distance between a pair of network nodes. The most common choices are defined 
in terms of latency (one-way delay and the so-termed round-trip time) or router hop-counts. Unfortunately, 
either probing or passively measuring all pairwise distances becomes infeasible in large-scale networks. 
Given those few affordable distance measurements, the problem of network distance prediction is to impute 
(that is interpolate) the missing entries in a highly-incomplete matrix of end-to-end distances. 

If one collects the end-to-end latencies dij of source-sink pairs in a delay matrix D := [ckj] € 
M ArxAr , strong dependencies among path delays render D low rank; see e.g., [35] for an experimental 
validation with multiple datasets. Intuitively, correlations among rows and columns of D emerge because 
nearby nodes (e.g., those belonging to a common subnetwork) are connected to every other node through 
paths with significant overlap, possibly sharing common bottleneck links. The low-rank property of D 
along with the distributed-processing requirements of large-scale networks, motivated decentralized matrix- 
factorization [35] and nuclear-norm minimization [38] algorithms for network distance prediction. Different 
from schemes based on Euclidean embedding via multi-dimensional scaling [22], low -rank modeling does 
not require distances in D to be symmetric and satisfy the triangle inequality - properties that are oftentimes 
violated by network-related distances [34]. 

To avoid the excessive overhead of active probing mechanisms, one can leverage network monitors 
that passively observe router hop-counts from traffic traversing those monitored links; see e.g., [24] and 
references therein. Collect these hop-count measurements in the matrix H := [/i mi n ] G N MxJV , where M 
is the number of monitors, and N (S> M) the total hosts observed. Because monitor m only observes 
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a fraction of the total network traffic, H will be depleted with missing entries. Despite typically having 
rank(H) = M, H consists of low-rank column blocks, each corresponding to a subnetwork with access to 
the Internet core through a single border router. Recognizing this structure, a high-rank matrix completion 
algorithm that performs subspace clustering of incomplete hop-count data was put forth in [24], and shown 
to attain good performance both in theory and practice. 

Different from the dynamic network delay cartography problem considered in Sec. II-B, network distance 
prediction approaches do not account for the temporal variations in the delays, and typically rely on batch 
imputation of the distance matrix of interest. The techniques used in Sec. II-B do not apply in this 
context either, since some path delays are never observed, and thus it is impossible to estimate the spatial 
covariance matrices (such as C v and C„) completely. 

C. RF cartography 

In the domain of spectrum sensing for CR networks, RF cartography amounts to constructing in a 
distributed fashion: ml) global power spectral density (PSD) maps capturing the distribution of radiated 
power across space, time, and frequency; and m2) local channel gain (CG) maps offering the propagation 
medium per frequency from each node to any point in space. These maps enable identification of oppor- 
tunistically available spectrum bands for re-use and handoff operation; as well as localization, transmit- 
power estimation, and tracking of primary user activities. While the focus here is on the construction of 
PSD maps, the interested reader is referred to [29] for a tutorial treatment on CG cartography. 

A cooperative approach to RF cartography was introduced in [6], that builds on a basis expansion 
model of the PSD map <J>(x, /) across space x <G M. 2 , and frequency /. Spatially-distributed CRs collect 
smoothed periodogram samples of the received signal at given sampling frequencies, based on which they 
want to determine the unknown expansion coefficients. Introducing a virtual spatial grid of candidate source 
locations, the estimation task can be cast as a linear LS problem with an augmented vector of unknown 
parameters. Still, the problem complexity (or effective degrees of freedom) can be controlled by capitalizing 
on two forms of sparsity: the first one introduced by the narrow-band nature of transmit-PSDs relative to 
the broad swaths of usable spectrum; and the second one emerging from sparsely located active radios in 
the operational space (due to the grid artifact). Nonzero entries in the parameter vector sought correspond 
to spatial location-frequency band pairs corresponding to active transmissions. All in all, estimating the 
PSD map and locating the active transmitters as a byproduct boils down to a variable selection problem. 
This motivates well employment of the Lasso for distributed sparse linear regression [38], an estimator 
also subsumed by (10) when X = OlxT, T = 1, and the regression matrix R has a specific structure that 
depends on the chosen bases and path-loss propagation model. 

Sparse total LS variants are also available to cope with uncertainty in the regression matrix, arising due 
to inaccurate channel estimation and grid-mismatch effects [29]. Nonparametric spline-based PSD map 
estimators [7] have been also shown effective in capturing general propagation characteristics including 
both shadowing and fading; see also Fig. 8 for an actual PSD atlas spanning 14 frequency sub-bands. 
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Fig. 8. Spline-based RF cartography using the dataset [31]. (Left) Detailed floor plan schematic including the location of 
N — 166 sensing radios; (Right-bottom) original measurements spanning 14 frequency sub-bands; (Right-center) estimated maps 
over the surveyed area; and (Right-top) extrapolated maps. The proposed estimator is capable of recovering the 9 (out of 14 
total) center frequencies that are being utilized for transmission. It accurately recovers the power levels in the surveyed area with 
a smooth extrapolation to zones were there are no measurements, and suggests possible locations for the transmitters [7]. 

V. Concluding remarks 

In this tutorial, the concept of dynamic network cartography is introduced as a framework to construct 
maps of the dynamically evolving network state, in an efficient and scalable manner even for large-scale 
heterogeneous networks. Focus is placed on key tasks geared to obtaining full yet succinct representation 
of network state metrics such as link traffic and path delays, as well as prompt and accurate identification 
of network anomalies from possibly partial and corrupted measurement data. 

Looking forward, the unceasing demand for continuous situational awareness calls for innovative and 
large-scale distributed SP algorithms, complemented by collaborative and adaptive monitoring platforms 
to accomplish the objectives of network management and control. Avenues where significant impact can 
be made include: i) judicious design of critical cognition infrastructure to sense, learn, and adapt to the 
environment where networks operate; ii) development of scalable tools for distilling, summarizing, and 
tracking the network state for the purpose of network management; iii) ensuring robustness in the face of 
missing and grossly-corrupted network data, in addition to possibly malicious attacks; and iv) developing 
effective network adaptation techniques based on global network inference, further impacting protocol 
designs, network taxonomy, and categorization. 
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